In a recent letter [1], Mantegna et al. report that certain
statistical signatures of natural language can be found in
non-coding DNA sequences. The vast majority of DNA in higher
organisms, including humans, consists of non-coding sequences
whose function, if any, is unknown. Hence this new analysis
is quite important. It suggests, as the authors concluded,
"the possible existence of one (or more than one) structured
biological language(s) present in non-coding DNA sequence".
Previous work from this group and others also showed that DNA
sequences have long-range power-law correlations [?], which
are found in non-coding regions but not in coding regions [?].

Since disorder or randomness dominates many natural phenomena
which exhibit power-law correlations, it is reasonable to ask
whether such correlations alone can produce the statistical
features attributed to the presence of language. Here we show
that random noise with power-law correlations, similar to the
ubiquitous "1/f" noise, exhibits the same "linguistic"
statistical signatures reported in ref. 1 for non-coding DNA.
We conclude that these signatures by themselves cannot
distinguish language from noise.

As in ref. 1, we carried out the Zipf analysis of "word"
frequency vs. rank as well as the Shannon analysis of
redundancy. Noise with spectral density of the form
$S(f) \sim f^{-\beta}$ was analyzed. The exponent $\beta$ can
be related to the exponent $\alpha$ used in the DNA walk
correlation analysis by $\beta = 2\alpha - 1$ [?]. Noise was
synthesized by numerically filtering white noise with a
power-law filtering function. The white noise was derived
from either a Gaussian random number generator or from
amplified thermal noise from a 1 MΩ resistor, with no
difference in outcome. We also analyzed 1/f noise from a
Josephson junction, which had the same statistical behavior
as synthesized noise. Signals were binned into four amplitude
ranges with equal weight and assigned values 0 to 3 so as to
provide a four-letter "alphabet", like that of DNA. This
method is equivalent to sampling with a 2-bit
analog-to-digital converter. The analysis consisted of
sampling contiguous blocks of length n (an n-tuple). The
n-point sampling window was sequentially shifted by one point
until the entire sequence was sampled. The number of
occurrences of each such n-tuple was counted, then the
n-tuples were ranked from highest to lowest frequency of
occurrence.
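
The procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the code used for the results reported here: the text says only that white noise was "numerically filtered", so the FFT-based filter is an assumption, as is the use of quantile edges to obtain four bins of equal weight. The function names (power_law_noise, quantize_equal_weight, ranked_tuple_counts) and the random seed are hypothetical.

```python
import numpy as np
from collections import Counter

def power_law_noise(n_points, beta, rng):
    """Filter Gaussian white noise so that its spectrum goes as f**(-beta).

    Assumption: frequency-domain filtering; the source does not say which
    numerical filter was used.
    """
    white = rng.standard_normal(n_points)
    spectrum = np.fft.rfft(white)
    freqs = np.fft.rfftfreq(n_points)
    gain = np.zeros_like(freqs)               # gain[0] = 0 drops the DC term
    gain[1:] = freqs[1:] ** (-beta / 2.0)     # amplitude gain; power ~ f**(-beta)
    return np.fft.irfft(spectrum * gain, n=n_points)

def quantize_equal_weight(signal, n_levels=4):
    """Assign each sample a letter 0..n_levels-1 with equal occupancy per bin."""
    edges = np.quantile(signal, np.linspace(0.0, 1.0, n_levels + 1)[1:-1])
    return np.searchsorted(edges, signal, side="right")

def ranked_tuple_counts(symbols, n):
    """Count all n-tuples from a window shifted by one point, most frequent first."""
    seq = symbols.tolist()
    counts = Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))
    return counts.most_common()

rng = np.random.default_rng(0)                # hypothetical seed
noise = power_law_noise(72_000, beta=0.3, rng=rng)
letters = quantize_equal_weight(noise)        # four-letter "alphabet"
ranked = ranked_tuple_counts(letters, n=6)    # list of (6-tuple, count) pairs
```

Plotting the counts in ranked against their rank (1, 2, 3, ...) on log-log axes gives a Zipf plot of the kind discussed in the text.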

The Zipf plot of word frequency vs. rank has power-law
behavior with exponent $\zeta = -1$ for natural languages.
For non-coding DNA, $\zeta$ ranged from 0.289 to 0.537 [1].
Figure 1 shows the Zipf plot for noise with various
correlation exponents, $\beta$, for 6-tuples from sequences
of 72k data points. The data cleanly fit a power law over
about three decades, similar to or better than the non-coding
DNA results of ref. 1. A monotonic increase in $\zeta$ with
increasing $\beta$ is observed, as shown in the inset. The
power-law scaling breaks down for $\beta \sim 1$ or larger
but is recovered when a larger alphabet is used [?]. Also
shown are the redundancy percentages, R, for the noise,
calculated as in ref. 1. The R values are similar to those in
ref. 1 with similar $\zeta$ values.
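
The two quantities discussed above, the Zipf exponent and the redundancy, could be estimated along the following lines. This is a hedged sketch, not the calculation of ref. 1: the exponent is taken as the negative of a least-squares slope on log-log axes (a sign convention assumed here), and the redundancy uses a finite-n estimate, R = 1 - H(n) / (n log2 4), where H(n) is the Shannon entropy of the observed n-tuple frequencies. The function names and the max_rank cutoff are hypothetical, and the example runs on an uncorrelated control sequence rather than on power-law noise.

```python
import numpy as np
from collections import Counter

def tuple_counts(symbols, n):
    """Occurrence counts of all overlapping n-tuples, highest first."""
    counts = Counter(tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1))
    return np.array(sorted(counts.values(), reverse=True), dtype=float)

def zipf_exponent(counts, max_rank=None):
    """Negative least-squares slope of log10(count) vs. log10(rank)."""
    counts = counts[:max_rank]                # max_rank=None keeps every rank
    ranks = np.arange(1, len(counts) + 1)
    slope, _ = np.polyfit(np.log10(ranks), np.log10(counts), 1)
    return -slope

def redundancy_percent(counts, n, alphabet_size=4):
    """Finite-n estimate 100 * (1 - H(n) / (n * log2(alphabet_size)))."""
    p = counts / counts.sum()
    entropy = -np.sum(p * np.log2(p))         # Shannon n-tuple entropy in bits
    return 100.0 * (1.0 - entropy / (n * np.log2(alphabet_size)))

rng = np.random.default_rng(1)                # hypothetical seed
control = rng.integers(0, 4, size=72_000).tolist()  # uncorrelated four-letter control
counts = tuple_counts(control, n=6)
zeta = zipf_exponent(counts, max_rank=1000)   # fit restricted to the first 1000 ranks
R = redundancy_percent(counts, n=6)
```

For an uncorrelated control the redundancy estimate should be small, with a residual positive value coming from finite sampling; sequences with power-law correlations are the case of interest in the text.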

The best fit to a power law in a Zipf plot in ref. 1
(fig. 1) is from mammalian DNA with $\zeta = 0.289$. The
correlation exponents found for certain mammalian, primarily
non-coding DNA were in the range $0.64 < \alpha < 0.71$,
which gives $0.28 < \beta < 0.42$ [?]. By comparison, we find
$\zeta = 0.28$ for noise with $\beta = 0.30$.
This, together with the fact that coding DNA has
$\beta \approx \zeta \approx 0$, is consistent with the idea
that power-law correlations without a linguistic component
could account for the behavior reported in ref. 1. Our
results demonstrate that the Zipf power-law scaling and
non-zero Shannon redundancies alone must not be relied upon
to distinguish language from noise. However, a detailed
comparison of Zipf exponents for DNA, language, and noise
with the same correlation exponent might be revealing. A full
account of these findings will appear elsewhere [?].
Figure Caption

Zipf plot, exponents, and redundancy % for 6-tuples
from power-law noise